Efficient learning and evaluation of complex concepts in inductive logic programming
Abstract
Inductive Logic Programming (ILP) is a subfield of Machine Learning with foundations in logic programming. In ILP, logic programming, a subset of first-order logic, is used as a uniform representation language for the problem specification and induced theories. ILP has been successfully applied to many real-world problems, especially in the biological domain (e.g. drug design, protein structure prediction), where relational information is of particular importance. The expressiveness of logic programs gives flexibility in specifying the learning task and makes the induced theories understandable. However, this flexibility comes at a high computational cost, constraining the applicability of ILP systems. Constructing and evaluating complex concepts remain two of the main issues that prevent ILP systems from tackling many learning problems. These learning problems are interesting both from a research perspective, as they raise the standards for ILP systems, and from an application perspective, as such target concepts occur naturally in many real-world applications. Such complex concepts cannot be constructed or evaluated by parallelizing existing top-down ILP systems or improving the underlying Prolog engine; novel search strategies and cover algorithms are needed. The main focus of this thesis is on how to efficiently construct and evaluate complex hypotheses in an ILP setting. To construct such hypotheses we investigate two approaches. The first, the Top Directed Hypothesis Derivation framework, implemented in the ILP system TopLog, uses a top theory to constrain the hypothesis space. In the second approach we revisit the bottom-up search strategy of Golem, lifting its restriction to determinate clauses, which had rendered Golem inapplicable to many key areas. These developments led to the bottom-up ILP system ProGolem. A challenge that arises with a bottom-up approach is the coverage computation of long, non-determinate clauses, for which Prolog's SLD-resolution is no longer adequate. We developed a new Prolog-based theta-subsumption engine which is significantly more efficient than SLD-resolution in computing the coverage of such complex clauses. We provide evidence that ProGolem achieves the goal of learning complex concepts by presenting a protein-hexose binding prediction application. The theory ProGolem induced has statistically significantly better predictive accuracy than that of other learners. More importantly, the biological insights ProGolem's theory provided were judged by domain experts to be relevant and, in some cases, novel.
Similar resources
An Inductive Logic Programming Query Language for Database Mining
First, a short introduction to inductive logic programming and machine learning is presented, followed by an inductive database mining query language, RDM (Relational Database Mining language). RDM integrates concepts from inductive logic programming, constraint logic programming, deductive databases and meta-programming into a flexible environment for relational knowledge discovery in databases. Th...
Learning functional logic classification concepts from databases
In this paper we examine the possibilities, advantages and shortcomings of tackling different data-mining problems with the Inductive Functional Logic Programming (IFLP) paradigm. As a functional extension of the Inductive Logic Programming (ILP) approach, IFLP has all the advantages of the latter but also the potential of a more natural representation language for classification, clustering and f...
Inducing Relational Concepts with Neural Networks via the LINUS System
This paper presents a method to induce relational concepts with neural networks using the inductive logic programming system LINUS. The method was applied successfully to some first-order inductive learning tasks taken from the machine learning literature, confirming the quality of the hypotheses generated by neural networks.
Learning Complex Mappings between Ontologies
In this paper, we introduce a new approach for constructing complex mappings between ontologies by transforming the task into a rule learning process. Derived from classical Inductive Logic Programming, our approach uses instance mappings as training data and employs tailoring heuristics to improve the learning efficiency. Empirical evaluation shows that our generated Horn-rule mappings are meaning...
Autonomous Discovery of Abstract Concepts by a Robot
In this paper we look at the discovery of abstract concepts by a robot autonomously exploring its environment and learning the laws of the environment. By abstract concepts we mean concepts that are not explicitly observable in the measured data, such as the notions of obstacle, stability or a tool. We consider mechanisms of machine learning that enable the discovery of abstract concepts. Such ...